Dissertation an Echo State Model of Non-markovian Reinforcement Learning
نویسنده
چکیده
OF DISSERTATION AN ECHO STATE MODEL OF NON-MARKOVIAN REINFORCEMENT LEARNING There exists a growing need for intelligent, autonomous control strategies that operate in real-world domains. Theoretically the state-action space must exhibit the Markov property in order for reinforcement learning to be applicable. Empirical evidence, however, suggests that reinforcement learning also applies to domains where the state-action space is approximately Markovian, a requirement for the overwhelming majority of real-world domains. These domains, termed non-Markovian reinforcement learning domains, raise a unique set of practical challenges. The reconstruction dimension required to approximate a Markovian state-space is unknown a priori and can potentially be large. Further, spatial complexity of local function approximation of the reinforcement learning domain grows exponentially with the reconstruction dimension. Parameterized dynamic systems alleviate both embedding length and state-space dimensionality concerns by reconstructing an approximate Markovian state-space via a compact, recurrent representation. Yet this representation extracts a cost; modeling reinforcement learning domains via adaptive, parameterized dynamic systems is characterized by instability, slow-convergence, and high computational or spatial training complexity. The objectives of this research are to demonstrate a stable, convergent, accurate, and scalable model of non-Markovian reinforcement learning domains. These objectives are fulfilled via fixed point analysis of the dynamics underlying the reinforcement learning domain and the Echo State Network, a class of parameterized dynamic system [30]. Understanding models of non-Markovian reinforcement learning domains requires understanding the interactions between learning domains and their models. Fixed point analysis
منابع مشابه
On using discretized Cohen-Grossberg node dynamics for model-free actor-critic neural learning in non-Markovian domains
We describe how multi-stage non-Markovian decision problems can be solved using actor-critic reinforcement learning by assuming that a discrete version of CohenGrossberg node dynamics describes the node-activation computations of a neural network (NN). Our NN (i.e., agent) is capable of rendering the process Markovian implicitly and automatically in a totally model-free fashion without learning...
متن کاملHuman learning in non-Markovian decision making
Humans can learn under a wide variety of feedback conditions. Particularly important types of learning fall under the category of reinforcement learning (RL) where a series of decisions must be made and a sparse feedback signal is obtained. Computational and behavioral studies of RL have focused mainly on Markovian decision processes (MDPs), where the next state and reward depends only on the c...
متن کاملNon-Markovian Control with Gated End-to-End Memory Policy Networks
Partially observable environments present an important open challenge in the domain of sequential control learning with delayed rewards. Despite numerous attempts during the two last decades, the majority of reinforcement learning algorithms and associated approximate models, applied to this context, still assume Markovian state transitions. In this paper, we explore the use of a recently propo...
متن کاملTotally Model-Free Reinforcement Learning by Actor-Critic Elman Networks in Non-Markovian Domains
In this paper we describe how an actor critic rein forcement learning agent in a non Markovian domain nds an optimal sequence of actions in a totally model free fashion that is the agent neither learns transitional probabilities and associated rewards nor by how much the state space should be augmented so that the Markov prop erty holds In particular we employ an Elman type re current neural ne...
متن کاملMemory Approaches To Reinforcement Learning In Non-Markovian Domains
Reinforcement learning is a type of unsupervised learning for sequential decision making. Q-learning is probably the best-understood reinforcement learning algorithm. In Q-learning, the agent learns a mapping from states and actions to their utilities. An important assumption of Q-learning is the Markovian environment assumption, meaning that any information needed to determine the optimal acti...
متن کامل